Remove protobuf for OTel ingestion #1406


Merged

merged 2 commits into parseablehq:main on Aug 14, 2025

Conversation

nikhilsinhaparseable (Contributor) commented Aug 14, 2025

Summary by CodeRabbit

  • New Features

    • Clear error returned when attempting to ingest OTEL data in protobuf format in the OSS edition.
    • Exposed APIs to initialize OTEL streams and a validator module for easier integrations.
    • Made the log source header key publicly available for external use.
  • Refactor

    • Unified OTEL ingestion flow with explicit Content-Type handling; JSON payloads supported, protobuf rejected with a clear message.
  • Chores

    • Removed deprecated protobuf decoding paths and related size checks to simplify ingestion logic.

coderabbitai bot commented Aug 14, 2025

Walkthrough

Removes OTEL Protobuf ingestion support and unifies OTEL content handling into a single function that processes JSON and rejects Protobuf. Updates handlers to use the new signature, exposes setup_otel_stream publicly, makes LOG_SOURCE_KEY public, and exposes the validator module via lib.rs.

Changes

• OTEL ingestion refactor — src/handlers/http/ingest.rs
  Removed Protobuf decoding and size checks; added unified process_otel_content handling Content-Type (supports JSON, rejects Protobuf); switched to known field lists; updated OTEL handlers to the new signature; made setup_otel_stream public; removed unused imports/constants.
• Public API exposure — src/handlers/mod.rs, src/lib.rs
  Made LOG_SOURCE_KEY a pub const; exposed the validator module by changing mod validator to pub mod validator.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant HTTP Handler
  participant OTEL Processor
  participant Storage

  Client->>HTTP Handler: OTEL ingest request (headers + body)
  HTTP Handler->>OTEL Processor: process_otel_content(Content-Type, body, known fields)
  alt Content-Type = JSON
    OTEL Processor->>Storage: flatten_and_push_logs(parsed JSON)
    OTEL Processor-->>HTTP Handler: Success/Result
  else Content-Type = Protobuf
    OTEL Processor-->>HTTP Handler: Error: Protobuf not supported
  end
  HTTP Handler-->>Client: Response
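For orientation, here is a minimal sketch of the dispatch the diagram describes, using the names that appear in the review diffs below (process_otel_content, flatten_and_push_logs, CONTENT_TYPE_JSON, CONTENT_TYPE_PROTOBUF); the exact signatures and error conversions are assumptions, not the merged code:

// Sketch only: mirrors the flow in the sequence diagram above.
// PostError, LogSource, flatten_and_push_logs and the content-type constants
// are the crate's own items; their signatures here are assumed.
async fn process_otel_content(
    req: &actix_web::HttpRequest,
    body: actix_web::web::Bytes,
    stream_name: &str,
    log_source: &LogSource,
) -> Result<(), PostError> {
    let p_custom_fields = get_custom_fields_from_header(req);

    match req.headers().get("Content-Type").and_then(|h| h.to_str().ok()) {
        // JSON payloads go through the common ingestion path.
        Some(ct) if ct == CONTENT_TYPE_JSON => {
            flatten_and_push_logs(
                serde_json::from_slice(&body)?,
                stream_name,
                log_source,
                &p_custom_fields,
            )
            .await?;
            Ok(())
        }
        // Protobuf is explicitly rejected in the OSS edition.
        Some(ct) if ct == CONTENT_TYPE_PROTOBUF => Err(PostError::Invalid(anyhow::anyhow!(
            "Protobuf ingestion is not supported in Parseable OSS"
        ))),
        Some(ct) => Err(PostError::Invalid(anyhow::anyhow!(
            "Unsupported Content-Type: {ct}. Expected application/json"
        ))),
        None => Err(PostError::Invalid(anyhow::anyhow!(
            "Missing Content-Type header. Expected application/json"
        ))),
    }
}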

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

for next release

Suggested reviewers

  • parmesant

Poem

I nibbled bytes and followed trails,
From Protobuf hills to JSON vales.
A single stream now guides my hop,
With headers checked—I bounce, I stop.
Public paths unveiled, I thump with glee,
Carrots committed, logs set free. 🥕✨


nikhilsinhaparseable marked this pull request as ready for review August 14, 2025 10:15

coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (3)
src/handlers/http/ingest.rs (3)

176-183: Panic risk: avoid unwrap() on header value parsing

to_str().unwrap() can panic on invalid header bytes. Propagate a well-formed error instead.

-    let log_source = LogSource::from(log_source.to_str().unwrap());
+    let log_source = LogSource::from(
+        log_source
+            .to_str()
+            .map_err(|_| PostError::Invalid(anyhow::anyhow!("Invalid {} header", LOG_SOURCE_KEY)))?
+    );

238-279: Content-Type equality is too strict; accept parameters and fix error messages

Many clients send application/json; charset=utf-8. The current equality check will reject these valid requests. Also, error messages should not advertise protobuf as acceptable given it’s explicitly unsupported now.

Apply this diff to robustly parse Content-Type and align error messages with protobuf removal:

-    match req
-        .headers()
-        .get("Content-Type")
-        .and_then(|h| h.to_str().ok())
-    {
-        Some(content_type) => {
-            if content_type == CONTENT_TYPE_JSON {
+    match req.headers().get("Content-Type").and_then(|h| h.to_str().ok()) {
+        Some(raw_content_type) => {
+            // Accept parameters like `; charset=utf-8`
+            let content_type = raw_content_type.split(';').next().unwrap_or("").trim();
+            if content_type == CONTENT_TYPE_JSON {
                 flatten_and_push_logs(
-                    serde_json::from_slice(&body)?,
+                    serde_json::from_slice(&body)?,
                     stream_name,
                     log_source,
                     &p_custom_fields,
                 )
                 .await?;
-            } else if content_type == CONTENT_TYPE_PROTOBUF {
+            } else if content_type == CONTENT_TYPE_PROTOBUF {
                 return Err(PostError::Invalid(anyhow::anyhow!(
-                    "Protobuf ingestion is not supported in Parseable OSS"
+                    "Protobuf ingestion is not supported in Parseable OSS"
                 )));
             } else {
                 return Err(PostError::Invalid(anyhow::anyhow!(
-                    "Unsupported Content-Type: {}. Expected application/json or application/x-protobuf",
-                    content_type
+                    "Unsupported Content-Type: {}. Expected application/json",
+                    raw_content_type
                 )));
             }
         }
         None => {
             return Err(PostError::Invalid(anyhow::anyhow!(
-                "Missing Content-Type header. Expected application/json or application/x-protobuf"
+                "Missing Content-Type header. Expected application/json"
             )));
         }
     }

238-279: No payload size guard: potential DoS via large OTEL JSON bodies

The prior 10MB guard appears to have been removed with the protobuf path. Add a conservative upper bound for JSON bodies to avoid memory spikes.

 async fn process_otel_content(
     req: &HttpRequest,
     body: web::Bytes,
     stream_name: &str,
     log_source: &LogSource,
 ) -> Result<(), PostError> {
     let p_custom_fields = get_custom_fields_from_header(req);
 
+    // Guard against excessively large payloads (10 MB)
+    const MAX_OTEL_JSON_BYTES: usize = 10 * 1024 * 1024;
+    if body.len() > MAX_OTEL_JSON_BYTES {
+        return Err(PostError::Invalid(anyhow::anyhow!(
+            "Payload too large: {} bytes (limit: {} bytes)",
+            body.len(),
+            MAX_OTEL_JSON_BYTES
+        )));
+    }
+
     match req
         .headers()
         .get("Content-Type")
         .and_then(|h| h.to_str().ok())
     {

If a global limit/config already exists elsewhere, wire it here instead of a hardcoded constant.

🧹 Nitpick comments (3)
src/lib.rs (1)

51-51: Confirm intent of expanding public API surface (pub mod validator)

Making validator public is a breaking surface change. If external crates aren’t meant to depend on internal validation details, consider pub(crate) or re-exporting only the required items instead of the entire module.

Would you like me to prepare a targeted re-export (e.g., pub use validator::{StreamNameValidationError, stream_name};) to minimize the public surface?
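For example, a narrower surface could re-export just those items while keeping the module private (a sketch; the item names come from the suggestion above and may not match the module's full contents):

// src/lib.rs — sketch of a narrower alternative to `pub mod validator`.
mod validator;

// Re-export only what integrators are expected to use; the rest of the
// module stays crate-private.
pub use validator::{stream_name, StreamNameValidationError};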

src/handlers/http/ingest.rs (2)

166-171: Visibility: consider pub(crate) for setup_otel_stream unless needed externally

If this helper is only used within this crate, prefer pub(crate) to avoid unnecessary public API expansion. If you intend SDKs or integrators to call this directly, keep it public.

-pub async fn setup_otel_stream(
+pub(crate) async fn setup_otel_stream(

192-235: Minor: setup_otel_stream returns LogSourceEntry but it’s unused by callers

Consider returning only (String, LogSource) or reusing the entry in the caller to avoid unused data. This reduces allocations and clarifies intent.

-) -> Result<(String, LogSource, LogSourceEntry), PostError> {
+) -> Result<(String, LogSource), PostError> {
@@
-    Ok((stream_name, log_source, log_source_entry))
+    Ok((stream_name, log_source))

…and remove the unused binding at the call sites.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 574e3b1 and a1ba2d7.

📒 Files selected for processing (3)
  • src/handlers/http/ingest.rs (7 hunks)
  • src/handlers/mod.rs (1 hunks)
  • src/lib.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/handlers/http/ingest.rs (1)
src/validator.rs (1)
  • stream_name (33-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
🔇 Additional comments (5)
src/handlers/mod.rs (1)

28-28: Making LOG_SOURCE_KEY public looks good

Exposing LOG_SOURCE_KEY aligns with the new OTEL flow that validates/reads this header across modules. No issues spotted.
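For illustration, a caller in another module could read the header through the now-public constant without panicking (a sketch; the module path, helper name, and surrounding types are assumptions based on this PR):

use crate::handlers::LOG_SOURCE_KEY;

// Sketch: resolve the log source from the request header, returning a
// well-formed error instead of panicking on missing or invalid bytes.
fn log_source_from_request(req: &actix_web::HttpRequest) -> Result<LogSource, PostError> {
    let raw = req
        .headers()
        .get(LOG_SOURCE_KEY)
        .ok_or_else(|| PostError::Invalid(anyhow::anyhow!("Missing {} header", LOG_SOURCE_KEY)))?
        .to_str()
        .map_err(|_| PostError::Invalid(anyhow::anyhow!("Invalid {} header", LOG_SOURCE_KEY)))?;
    Ok(LogSource::from(raw))
}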

src/handlers/http/ingest.rs (4)

38-40: Good: switching to known field lists per OTEL signal

Using OTEL_LOG_KNOWN_FIELD_LIST/OTEL_METRICS_KNOWN_FIELD_LIST/OTEL_TRACES_KNOWN_FIELD_LIST simplifies stream setup and avoids protobuf-specific flattening logic. This aligns well with the PR objectives.


288-295: Good: OTEL logs path now flows through common setup and content processing

The refactor reduces duplication and centralizes validation. Assuming the content-type check is relaxed as suggested, this should be robust for typical agents.

Also applies to: 296-296


308-315: Good: OTEL metrics path parity with logs

Mirrors the logs path and enforces log-source alignment via known fields. Looks consistent.

Also applies to: 316-316


328-335: Good: OTEL traces path parity with logs/metrics

Same comments as above. Centralizing content processing simplifies future maintenance.

Also applies to: 336-336

nitisht merged commit ed19109 into parseablehq:main on Aug 14, 2025
12 of 14 checks passed